AITopics | importance weight

Token-Level Self-Play with Importance-Aware Guidance for Large Language Models

Neural Information Processing SystemsJun-21-2026, 20:22:43 GMT

Leveraging the power of Large Language Models (LLMs) through preference optimization is crucial for aligning model outputs with human values. Direct Preference Optimization (DPO) has recently emerged as a simple yet effective method by directly optimizing on preference data without the need for explicit reward models. However, DPO typically relies on human-labeled preference data, which can limit its scalability. Self-Play Fine-Tuning (SPIN) addresses this by allowing models to generate their own rejected samples, reducing the dependence on human annotations. Nevertheless, SPIN uniformly applies learning signals across all tokens, ignoring the fine-grained quality variations within responses. As the model improves, rejected samples increasingly contain high-quality tokens, making the uniform treatment of tokens suboptimal. In this paper, we propose SWIFT (Self-Play Weighted Fine-Tuning), a fine-grained self-refinement method that assigns token-level importance weights estimated from a stronger teacher model. Beyond alignment, we also demonstrate that SWIFT serves as an effective knowledge distillation strategy by using the teacher not for logits matching, but for reward-guided token weighting. Extensive experiments on diverse benchmarks and settings demonstrate that SWIFT consistently surpasses both existing alignment approaches and conventional knowledge distillation methods.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)
Instructional Material (0.67)
Overview (0.67)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Energy-based generatormatching: A neural sampler for general state space

Neural Information Processing SystemsJun-20-2026, 19:51:51 GMT

We propose Energy-based generator matching (EGM), a modality-agnostic approach to train generative models from energy functions in the absence of data. Extending the recently proposed generator matching, EGM enables training of arbitrary continuous-time Markov processes, e.g., diffusion, flow, and jump, and can generate data from continuous, discrete, and a mixture of two modalities. To this end, we propose estimating the generator matching loss using self-normalized importance sampling with an additional bootstrapping trick to reduce variance in the importance weight.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)
(2 more...)

Add feedback

Conformal Candidate Certification for Offline Model-Based Optimization

Choi, Seungjin

arXiv.org Machine LearningJun-16-2026

Offline model-based optimization (MBO) proposes candidates by optimizing a surrogate trained on a fixed historical dataset. Because candidates are deliberately out-of-distribution, surrogate rankings are least reliable exactly where the optimizer is most aggressive, yet existing methods provide no per-candidate statistical certificate that a design meets a target threshold. We propose \emph{Conformal Candidate Certification} (CCC), a post-hoc wrapper that attaches a calibrated one-sided lower bound to each candidate and advances only those whose bound exceeds the target. We show that entropy-regularized surrogate maximization induces a Gibbs-tilted proposal, so the same surrogate supplies importance weights for weighted conformal prediction without a separate density-ratio estimation step. In a controlled synthetic study, CCC certifies $16.7\%$ of an aggressive proposal pool with empirical coverage 0.990 at nominal 0.90, while standard conformal prediction ignoring the covariate shift collapses to 0.416 coverage.

artificial intelligence, conformal candidate certification, machine learning, (9 more...)

arXiv.org Machine Learning

2606.15217

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.96)

Add feedback

f31bf160569618084ba9bdc2a8de29d0-Paper-Conference.pdf

Neural Information Processing SystemsApr-30-2026, 07:10:32 GMT

machine learning, reinforcement learning, trajectory, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

4476b929e30dd0c4e8bdbcc82c6ba23a-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 15:45:37 GMT

artificial intelligence, estimator, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.93)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.92)
Information Technology > Data Science (0.67)

Add feedback

Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning

Neural Information Processing SystemsApr-25-2026, 15:45:33 GMT

Importance Sampling (IS) is a widely used building block for a large variety of off-policy estimation and learning algorithms. However, empirical and theoretical studies have progressively shown that vanilla IS leads to poor estimations whenever the behavioral and target policies are too dissimilar. In this paper, we analyze the theoretical properties of the IS estimator by deriving a novel anticoncentration bound that formalizes the intuition behind its undesired behavior. Then, we propose a new class of IS transformations, based on the notion of power mean. To the best of our knowledge, the resulting estimator is the first to achieve, under certain conditions, two key properties: (i) it displays a subgaussian concentration rate; (ii) it preserves the differentiability in the target distribution. Finally, we provide numerical simulations on both synthetic examples and contextual bandits, in comparison with off-policy evaluation and learning baselines.

artificial intelligence, estimator, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.93)

Add feedback

148bbc25b934211d80435b5cad5a7198-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 04:00:57 GMT

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > Canada (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Data Science > Data Mining (0.46)

Add feedback

21c86d5b10cdc28664ccdadf0a29065a-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 01:35:42 GMT

artificial intelligence, machine learning, mcmc, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.93)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Add feedback

Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

Neural Information Processing SystemsApr-24-2026, 22:26:44 GMT

Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning techniques such as behavior cloning is to find a policy that achieves a higher average return than the trajectories constituting the dataset. However, we empirically find that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset. We argue this is due to an assumption made by current offline RL algorithms of staying close to the trajectories in the dataset. If the dataset primarily consists of sub-optimal trajectories, this assumption forces the policy to mimic the suboptimal actions. We overcome this issue by proposing a sampling strategy that enables the policy to only be constrained to "good data" rather than all actions in the dataset (i.e., uniform sampling). We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms. Our evaluation demonstrates significant performance gains in 72 imbalanced datasets, D4RL dataset, and across three different offline RL algorithms.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.92)

Genre: Research Report > New Finding (1.00)

Industry: Government (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

BR-SNIS: Bias Reduced Self-Normalized Importance Sampling

Neural Information Processing SystemsApr-24-2026, 08:56:23 GMT

Importance Sampling (IS) is a method for approximating expectations under a target distribution using independent samples from a proposal distribution and the associated importance weights. In many applications, the target distribution is known only up to a normalization constant, in which case self-normalized IS (SNIS) can be used. While the use of self-normalization can have a positive effect on the dispersion of the estimator, it introduces bias. In this work, we propose a new method, BR-SNIS, whose complexity is essentially the same as that of SNIS and which significantly reduces bias without increasing the variance. This method is a wrapper in the sense that it uses the same proposal samples and importance weights as SNIS, but makes clever use of iterated sampling-importance resampling (i-SIR) to form a bias-reduced version of the estimator. We furnish the proposed algorithm with rigorous theoretical results, including new bias, variance and high-probability bounds, and these are illustrated by numerical examples.

artificial intelligence, estimator, machine learning, (16 more...)

Neural Information Processing Systems

Country: Europe (0.46)

Genre: Research Report (0.68)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Filters

Collaborating Authors

importance weight

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Token-Level Self-Play with Importance-Aware Guidance for Large Language Models

Energy-based generatormatching: A neural sampler for general state space

Conformal Candidate Certification for Offline Model-Based Optimization

f31bf160569618084ba9bdc2a8de29d0-Paper-Conference.pdf

4476b929e30dd0c4e8bdbcc82c6ba23a-Supplemental.pdf

Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning

148bbc25b934211d80435b5cad5a7198-Paper-Conference.pdf

21c86d5b10cdc28664ccdadf0a29065a-Paper-Conference.pdf

Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

BR-SNIS: Bias Reduced Self-Normalized Importance Sampling